
    The Power and Limitations of Uniform Samples in Testing Properties of Figures

    We investigate testing of properties of 2-dimensional figures that consist of a black object on a white background. Given a parameter epsilon in (0,1/2), a tester for a specified property has to accept with probability at least 2/3 if the input figure satisfies the property and reject with probability at least 2/3 if it does not. In general, property testers can query the color of any point in the input figure. We study the power of testers that get access only to uniform samples from the input figure. We show that for the property of being a half-plane, uniform testers are as powerful as general testers: they require only O(1/epsilon) samples. In contrast, we prove that convexity can be tested with O(1/epsilon) queries by testers that can make queries of their choice, while uniform testers for this property require Omega(1/epsilon^{5/4}) samples. Previously, the fastest known tester for convexity needed Theta(1/epsilon^{4/3}) queries.
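
    As a toy illustration of the uniform-sampling model (a minimal sketch, not the paper's algorithm or analysis), the code below accepts iff the sampled black and white points are strictly linearly separable, checked with a small feasibility LP. The color oracle, the unit-square domain, and the sample-size constant are assumptions introduced for the sketch.

        import numpy as np
        from scipy.optimize import linprog

        def separable(black, white):
            # Feasibility LP: find (a, b, c) with a*x + b*y + c >= 1 on black
            # samples and <= -1 on white samples (strict linear separability).
            rows = [[-x, -y, -1.0] for (x, y) in black] + \
                   [[x, y, 1.0] for (x, y) in white]
            if not rows:
                return True
            res = linprog(c=[0.0, 0.0, 0.0], A_ub=np.array(rows),
                          b_ub=-np.ones(len(rows)), bounds=[(None, None)] * 3)
            return res.success

        def uniform_halfplane_tester(color, eps, rng=None):
            # color(x, y) -> 1 (black) or 0 (white) is a hypothetical oracle
            # for the figure on the unit square; the constant 3 is illustrative
            # of the O(1/epsilon) sample bound, not taken from the paper.
            rng = rng or np.random.default_rng()
            pts = rng.random((int(np.ceil(3.0 / eps)), 2))
            black = [tuple(p) for p in pts if color(*p) == 1]
            white = [tuple(p) for p in pts if color(*p) == 0]
            return separable(black, white)  # accept iff consistent with a half-plane

    A far-from-half-plane figure yields, with good probability, sampled black and white points whose convex hulls overlap, so the LP is infeasible and the tester rejects.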

    Tolerant Testers of Image Properties

    We initiate a systematic study of tolerant testers of image properties or, equivalently, algorithms that approximate the distance from a given image to the desired property (that is, the smallest fraction of pixels that need to change in the image to ensure that the image satisfies the desired property). Image processing is a particularly compelling area of applications for sublinear-time algorithms and, specifically, property testing. However, for testing algorithms to reach their full potential in image processing, they have to be tolerant, which allows them to be resilient to noise. Prior to this work, only one tolerant testing algorithm for an image property (image partitioning) had been published. We design efficient approximation algorithms for the following fundamental questions: What fraction of pixels has to be changed in an image so that it becomes a half-plane? A representation of a convex object? A representation of a connected object? More precisely, our algorithms approximate the distance to three basic properties (being a half-plane, convexity, and connectedness) within a small additive error epsilon, after reading a number of pixels polynomial in 1/epsilon and independent of the size of the image. The running time of the testers for half-plane and convexity is also polynomial in 1/epsilon. Tolerant testers for these three properties were not investigated previously. For convexity and connectedness, even the existence of distance approximation algorithms with query complexity independent of the input size is not implied by previous work. (It does not follow from VC-dimension bounds, since the VC dimension of convexity and connectedness, even in two dimensions, depends on the input size. It also does not follow from the existence of non-tolerant testers.) Our algorithms require very simple access to the input: uniform random samples for the half-plane property and convexity, and samples from uniformly random blocks for connectedness. However, the analysis of the algorithms, especially for convexity, requires many geometric and combinatorial insights. For example, in the analysis of the algorithm for convexity, we define a set of reference polygons P_{epsilon} such that (1) every convex image has a nearby polygon in P_{epsilon} and (2) one can use dynamic programming to quickly compute the smallest empirical distance to a polygon in P_{epsilon}. This construction might be of independent interest.
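
    To make the "distance approximation from uniform samples" idea concrete for the easiest of the three properties, here is a hedged sketch (not the paper's method): it estimates the distance to being a half-plane as the best empirical error of any half-plane on the sample, using the standard fact that an optimal separating line on a finite sample can be anchored at two sample points. The sampling oracle, sample-size constant, and boundary tie-breaking are assumptions for the sketch.

        import itertools
        import numpy as np

        def empirical_halfplane_distance(points, colors):
            # points: (m, 2) array of sampled pixel locations; colors: 0/1 labels.
            # Minimum fraction of samples misclassified by any half-plane,
            # found by brute force over lines through sample pairs: O(m^3).
            colors = np.asarray(colors)
            best = min(colors.mean(), 1.0 - colors.mean())  # all-white / all-black
            for i, j in itertools.combinations(range(len(points)), 2):
                p, d = points[i], points[j] - points[i]
                # Signed side of each sample w.r.t. the line through points i, j.
                s = (points[:, 0] - p[0]) * d[1] - (points[:, 1] - p[1]) * d[0]
                for side in (1, -1):
                    pred = (side * s >= 0).astype(int)  # boundary ties go to black
                    best = min(best, float(np.mean(pred != colors)))
            return best

        def tolerant_halfplane_tester(sample_pixels, eps, rng=None):
            # sample_pixels(m, rng) -> (points, colors) is a hypothetical sampling
            # oracle. m = O(1/eps^2) samples suffice for additive error eps by
            # uniform convergence, since half-planes have constant VC dimension;
            # the constant 4 is illustrative.
            rng = rng or np.random.default_rng()
            points, colors = sample_pixels(int(np.ceil(4.0 / eps ** 2)), rng)
            return empirical_halfplane_distance(points, colors)

    This VC-dimension route is exactly what the abstract notes is unavailable for convexity and connectedness, which is why those two properties need the reference-polygon and block-sampling machinery instead.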

    Fold classification based on secondary structure – how much is gained by including loop topology?

    BACKGROUND: It has been proposed that secondary structure information can be used to classify (to some extent) protein folds. Since this method uses very limited information about the protein structure, it is not surprising that it has a higher error rate than approaches that use a full 3D fold description. On the other hand, comparing 3D protein structures is computationally intensive. This raises the question of to what extent the error rate can be decreased with each new source of information, especially if the new information can still be used with simple alignment algorithms. We consider whether information about closed loops can improve the accuracy of this approach. While the answer appears to be obvious, we had to overcome two challenges: first, how to encode and compare topological information so that local alignment of strings properly identifies similar structures; second, how to properly measure the effect of the new information in a large data sample. We investigate alternative ways of computing and presenting this information. RESULTS: We used a set of beta proteins with at most 30% pairwise identity to test the approach; local alignment scores were used to build a tree of clusters, which was evaluated using a new log-odds cluster scoring function. In particular, we derive a closed formula for the probability of obtaining a given score by chance. Parameters of the local alignment function were optimized using a genetic algorithm. Of the 81 folds that had more than one representative in our data set, log-odds scores registered significantly better clustering in 27 cases, significantly worse clustering in 6 cases, and small differences in the remaining cases. Various notions of significant or average change were considered, and the results all pointed in the same direction. CONCLUSION: We found that, on average, properly presented information about loop topology noticeably improves the accuracy of the method, but the benefits vary between fold families as measured by the log-odds cluster score.
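
    The pipeline hinges on local alignment of strings that encode secondary structure plus loop topology. As a self-contained illustration (not the paper's tuned scoring, which was optimized by a genetic algorithm), here is a standard Smith-Waterman local alignment score over a hypothetical H/E/C-style alphabet; the match, mismatch, and gap parameters are placeholders.

        def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
            # Standard Smith-Waterman local alignment score with a linear gap
            # penalty. a, b: strings over any alphabet, e.g. secondary-structure
            # encodings augmented with loop-topology symbols.
            rows, cols = len(a) + 1, len(b) + 1
            H = [[0] * cols for _ in range(rows)]
            best = 0
            for i in range(1, rows):
                for j in range(1, cols):
                    sub = match if a[i - 1] == b[j - 1] else mismatch
                    H[i][j] = max(0, H[i - 1][j - 1] + sub,
                                  H[i - 1][j] + gap, H[i][j - 1] + gap)
                    best = max(best, H[i][j])
            return best

        # Example: compare two hypothetical strand/helix/coil encodings.
        print(smith_waterman("EECCHHHEEC", "EECHHHEECC"))

    Because the score is a pure function of the encoding, swapping in a richer alphabet that marks closed loops changes only the input strings, which is what lets the paper test the value of loop topology without changing the alignment machinery.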